Crowd & Prejudice: An Impossibility Theorem for Crowd Labelling without a Gold Standard
ABSTRACT

A common use of crowd sourcing is to obtain labels for a 
dataset. Several algorithms have been proposed to iden- 
tify uninformative members of the crowd so that their 
labels can be disregarded and the cost of paying them 
avoided. One common motivation of these algorithms is 
to try and do without any initial set of trusted labeled 
data. We analyse this class of algorithms as mechanisms 
in a game-theoretic setting to understand the incentives 
they create for workers. We find an impossibility result:
without any ground truth, and when workers have
access to commonly shared 'prejudices' upon which they
agree but which are not informative of true labels, there are
always equilibria where all agents report the prejudice. A
small amount of gold standard data is found to
be sufficient to rule out these equilibria.



INTRODUCTION 

For "the crowd" is untruth - Kierkegaard 

Prior literature has proposed a large number of
algorithms that take a set of data points labeled by a
group of agents and try to estimate the reliability
of those agents. These algorithms can be divided into two
sets: those that leverage a small amount of gold standard
(ground truth) data (Snow, O'Connor, Jurafsky &
Ng 2008, Wauthier & Jordan 2011), and those that do
not (Dekel & Shamir 2009b, Raykar, Yu, Zhao, Jerebko,
Florin, Hermosillo Valadez, Bogoni & Moy 2009, Raykar,
Yu, Zhao, Valadez, Florin, Bogoni & Moy 2010, Kumar
& Lease 2011, Dekel & Shamir 2009a, Yan, Rosales,
Fung, Schmidt, Hermosillo, Bogoni, Moy & Dy 2010).
The algorithms that attempt to do without a gold
standard do so by treating agreement among different
labellers as indicative of the correctness of a label.
This agreement is either at the level of how to label a
given datapoint, as in most cases, or in how features map
to labels, as in (Dekel & Shamir 2009b). To achieve this
they place their trust in agents who provide labels that
are consistent with the labels provided by other agents
or, in the case where the same datapoint is not labeled
twice, whose proposed feature-to-label mapping is
consistent with other agents' mapping of features to labels.
It is often the case that labellers want to be seen as
informed by those who are collecting the labels, as the
labelling tasks often pay and it is natural for those
collecting the data to avoid the unnecessary cost of paying
for labels from uninformed labellers.

We analyse the class of algorithms that do not use gold
standard data as mechanisms in a game-theoretic setting,
in order to understand the incentives they create
for the agents providing the labels. We first present an
impossibility result: without gold label data, and
when workers have access to commonly shared 'prejudices'
upon which they agree but which are not informative
of true labels, there are always equilibria where
the mechanism does not obtain the true labels from the
informed workers, but rather all workers report the
prejudice. We then consider how a small amount of gold
data is generally sufficient to place situations where the
prejudice is reported by informed players outside the
equilibrium set.

One possible criticism of our work is that there is little 
interest in pointing out that when the assumptions of 
a statistical model (in this case, that agreement among
labellers indicates correctness) do not hold, the conclusions
drawn from such a model can be misleading. Our
argument, however, is more subtle than this: the incen- 
tives created by the natural applications of the model 
in its intended task undermine the very assumptions of 
the model, by creating incentives for players to agree on 
the labelling with others, irrespective of whether they 
believe these to be the true labels. 

To make the situation we have in mind more concrete
and to clarify how it differs from the standard information
cascades studied in economics, consider the hypothetical
example of a professor who assigns their teaching assistants
to grade exams, without grading any themselves.
The TAs may or may not know the topic at hand (be
informed or uninformed) and they must provide a grade
(label) to each exam they are assigned. If the TAs each
grade one question on each exam and do so sequentially,
so that for each previous answer in a given exam they
can observe the grades other TAs have assigned,
what in economics is referred to as an information
cascade can occur. TAs grading later questions can look at
the grade a student received for earlier exam questions
and guess that the question they were assigned will
receive a similar grade, instead of having to understand
the answer the student gave and how it
relates to the correct answer. In contrast, we study a
related but different situation, analogous to one in which
each TA grades (possibly overlapping) full exams, and
they do so simultaneously without access to the grades the
others are assigning. Note that if the TAs expect to be
rewarded for agreement with others and if they believe
others may use some prejudice to grade the exam, such
as assigning higher grades to, say, students with neat
handwriting or who use longer words, they might be
motivated to also use said prejudice.

An equivalent example can be considered in a crowd
sourcing context. Suppose we ask speakers of language B
for translations of a given word in language A.
A common prejudice for speakers of language B would
be that if a word in language A sounds like a word
in language B it must translate to that word. Even if
bilingual speakers of A and B are present in the worker
pool, if they believe that the consensus label will be
the similar-sounding (but possibly incorrect) translation,
they may choose to report this prejudice so as to continue
to be employed in the translation task.

RELATED LITERATURE 

The study of learning algorithms from a mechanism
design perspective was initiated by (Dekel, Fischer
et al. 2008) on a task they term "incentive compatible
regression learning", where agents care about the function
that is learned and can report their observations
strategically to manipulate it. In contrast to that model,
our agents are not motivated to manipulate the learned
function mapping examples to labels but are instead
motivated to be seen by the mechanism as capable and
thus to continue to be employed as a source of labels
for the task. (Meir, Procaccia & Rosenschein 2010) study
strategy-proof mechanisms where agents report labels
and their objective is to maximize the accuracy of the
learned classifier only on their subset of the data.

Mechanisms designed to elicit subjective probabilities
truthfully exploit richer action sets, where the action is
not just reporting a label, but also reporting the distribution
of labels the population will report. Examples of
this type of mechanism are the Bayesian Truth Serum
introduced in (Prelec 2004), and the extension of the Peer
Prediction Method (Miller & Zeckhauser 2008) proposed
by (Witkowski & Parkes 2011).

Our 'prejudice' can be thought of as an 'extrinsic random
variable' which allows agents to coordinate their decisions.
Models for the equilibria of such variables have been
extensively studied in the economics literature. For
a recent review of the literature see (Shell 2008); for
experimental evidence from the laboratory see (Duffy &
Fisher 2005).

A rich literature on herding behaviour exists in economics.
It is closely related to the model we examine
but in a sequential instead of a simultaneous setting,
so the externality that encourages the herding is of
an informational nature, whereas in our case
it directly affects payoffs. When agents arrive in an exogenously
ordered sequence and can observe previous agents'
choices and follow them, they can either avoid
paying the cost of acquiring information about the payoff
of actions themselves or disregard private information
which they may possess. The classic papers in this
stream are (Banerjee 1992, Bikhchandani, Hirshleifer &
Welch 1992). Experimental studies of herding and
information cascades have been carried out
both in the laboratory (Cipriani & Guarino 2005) and
on the internet (Drehmann, Oechssler & Roider 2005).

For a recent multidisciplinary review of herding in
humans from a cognitive neuroscience perspective see
(Raafat, Chater & Frith 2009).

SETTING 

Let $\mathcal{Y} = \{1, \ldots, K\}$ be a set of labels. We consider a game
between the world, a mechanism $M$, and a set of agents
$A$. Each agent $a \in A$ is of one of two types: informed
or uninformed. We denote the set of informed
agents by $A_I \subset A$. The goal of the mechanism is to
identify which agents are informed. The goal of the
agents is to be identified as informed by the mechanism,
even if they are not.

Each game is determined by a distribution $P$ over $\mathcal{Y}^3$
with the random variables $(Y, U, I) \sim P$. Letting $I(\cdot\,;\cdot)$
denote mutual information, we require that $P$ satisfy
three conditions:

$$I(Y;U) = 0 \tag{1}$$

$$I(Y;I) > 0 \tag{2}$$

$$P(Y) \neq P(U) \tag{3}$$

We note that conditions 1 and 2 are equivalent to requiring
$P(Y,U,I) = P(U)P(Y)P(I|Y)$ and $P(Y,I) \neq P(Y)P(I)$,
respectively. Intuitively, $Y$ is to be interpreted
as the "true" label the mechanism is trying to
learn, $U$ is some uninformative signal about $Y$ that has
a different distribution to $Y$, and $I$ is an informative
signal about $Y$. It is assumed that $P(Y)$ is common
knowledge to the mechanism and all the agents.
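
As an illustration of conditions 1 to 3, the following sketch builds a joint distribution from the factorisation $P(Y,U,I) = P(U)P(Y)P(I|Y)$ and checks the two mutual information conditions numerically. The numbers (two labels, the particular probabilities) and the helper names `mutual_information` and `marginal_pair` are hypothetical choices made only for this example.

```python
import math
from itertools import product

def mutual_information(joint):
    """I(A;B) in nats for a joint distribution given as {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def marginal_pair(joint, idx_a, idx_b):
    """Marginalise a joint over (y, u, i) tuples down to two coordinates."""
    out = {}
    for key, p in joint.items():
        pair = (key[idx_a], key[idx_b])
        out[pair] = out.get(pair, 0.0) + p
    return out

# Hypothetical example with K = 2 labels (values chosen for illustration).
P_Y = {1: 0.7, 2: 0.3}                       # distribution of the true label
P_U = {1: 0.4, 2: 0.6}                       # distribution of the prejudice
P_I_given_Y = {1: {1: 0.9, 2: 0.1},          # P_I_given_Y[y][i]
               2: {1: 0.2, 2: 0.8}}

# Joint distribution built from P(Y,U,I) = P(U) P(Y) P(I|Y).
P = {(y, u, i): P_Y[y] * P_U[u] * P_I_given_Y[y][i]
     for y, u, i in product(P_Y, P_U, (1, 2))}

mi_yu = mutual_information(marginal_pair(P, 0, 1))  # condition 1: should be 0
mi_yi = mutual_information(marginal_pair(P, 0, 2))  # condition 2: should be > 0
differs = P_Y != P_U                                # condition 3

print(f"I(Y;U) = {mi_yu:.6f}, I(Y;I) = {mi_yi:.6f}, P(Y) != P(U): {differs}")
```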

The game is played by the world first secretly drawing
$y \sim P(Y)$. Every agent $a \in A$ then receives an i.i.d.
draw $u_a$ from $P(U|Y = y) = P(U)$. In addition, informed
agents receive a draw $i_a$ from $P(I|Y = y)$. The
agents then each decide to report some $y_a \in \mathcal{Y}$ to the
mechanism and from this the mechanism must try to
determine which agents are informed and which are not.

Strategically, uninformed agents have two choices: they
can play a prejudiced strategy and report $y_a = u_a$, or
they can randomise and draw a new $y_a$ from $P(Y)$. Informed
agents can also play prejudice or randomise but,
in addition, can also play a truthful strategy and report
$y_a = i_a$. The decision for an agent to be truthful,
prejudiced, or random depends on the mechanism. An informed
agent may strategically decide to not be truthful
in order to maximise its chances of being identified as
informed.
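
A single round of the game described above can be sketched as follows. This is a minimal illustration under assumed distributions for two labels; the function name `play_round`, the strategy strings, and all probabilities are hypothetical and are not taken from any of the cited algorithms.

```python
import random

# Hypothetical distributions over two labels, chosen only for illustration.
LABELS = [1, 2]
P_Y = [0.7, 0.3]                          # P(Y), common knowledge
P_U = [0.4, 0.6]                          # P(U), independent of Y
P_I_GIVEN_Y = {1: [0.9, 0.1], 2: [0.2, 0.8]}

def play_round(informed, strategy, rng):
    """Play one round: the world draws y, each agent draws its signals
    and reports a label according to its strategy.

    informed: dict agent -> bool; strategy: dict agent -> one of
    "truthful", "prejudiced", "random". Returns (y, reports).
    """
    y = rng.choices(LABELS, P_Y)[0]       # secret draw by the world
    reports = {}
    for agent, is_informed in informed.items():
        u = rng.choices(LABELS, P_U)[0]   # prejudice signal u_a
        s = strategy[agent]
        if s == "truthful":
            if not is_informed:
                raise ValueError("only informed agents can be truthful")
            reports[agent] = rng.choices(LABELS, P_I_GIVEN_Y[y])[0]  # i_a
        elif s == "prejudiced":
            reports[agent] = u
        elif s == "random":
            reports[agent] = rng.choices(LABELS, P_Y)[0]  # fresh draw from P(Y)
        else:
            raise ValueError(f"unknown strategy: {s}")
    return y, reports

rng = random.Random(0)
informed = {"a": True, "b": True, "c": False}
strategy = {"a": "truthful", "b": "prejudiced", "c": "random"}
y, reports = play_round(informed, strategy, rng)
```

The mechanism only ever sees `reports`; the draw `y` and the signals stay hidden from it.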
For our purposes, a mechanism is a function from a set
of reports $R = \{y_a : a \in A\}$ to a set of agents $A_M \subseteq A$
that the mechanism identifies as informed and truthful.
That is, $M : \mathcal{Y}^A \to 2^A$. The goal of $M$ is to maximise
the probability that $A_M$ coincides with $A_{I,T}$, the set of
informed and truthful agents. It suffices for $M$ to ensure that

$$p_{I,T} = P(a \in A_M \mid a \in A_{I,T}) > \frac{1}{2} \tag{4}$$

$$p_{U,U} = P(a \notin A_M \mid a \notin A_{I,T}) > \frac{1}{2} \tag{5}$$

since repeated independent samples will guarantee that
$\left(\frac{p_{I,T}}{1 - p_{I,T}}\right)^n \to \infty$ and $\left(\frac{1 - p_{U,U}}{p_{U,U}}\right)^n \to 0$ as the number of
labeled examples $n$ goes to infinity.

RESULTS 

We consider the class of mechanisms satisfying the above
which succeed when a majority of players are informed
and report truthfully and uninformed players randomise.
All proposed algorithms in the literature, to our
knowledge, satisfy this elementary criterion.

Equilibria 

Three Bayes-Nash equilibria of the game induced by 
these mechanisms are: 

All randomise. When all other agents are randomizing,
an agent is indifferent among all labels and thus
also about randomising over them. This is not a particularly
robust equilibrium, as a deviation of two agents
to prejudice is sufficient to cause all other players
to deviate to it, and a deviation of two informed agents
to truthfulness is also sufficient to cause all other informed
agents to improve their payoff by deviating to
truthfulness.

Informed are truthful and uninformed randomise. 

In this equilibrium the mechanism by and large works,
in the sense that the set of players that act consistently
with one another is the same as the set of informed and
truthful agents.

Both play prejudice. When all other players play prejudice,
an agent's probability of having their labels
found to coincide with others is maximized by playing
prejudice, regardless of whether they are informed or
uninformed.
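
The incentive behind the third equilibrium can be checked with a small calculation. The sketch below assumes, as one reading of "commonly shared" prejudice, that for a given item all agents observe the same prejudice value $u$, and that every other agent reports it. Under that assumption it compares the probability that a single agent's report matches the crowd under each strategy. The distributions and the function name `agreement_with_prejudiced_crowd` are hypothetical.

```python
def agreement_with_prejudiced_crowd(P_U, report_dist=None):
    """Probability that one agent's report equals the crowd's report when
    every other agent reports the shared prejudice u ~ P_U.

    report_dist=None: the agent also reports u (prejudiced strategy).
    Otherwise the agent's report is drawn independently of u from
    report_dist, which covers both the random and the truthful strategy,
    because Y and I are independent of U in the model.
    """
    if report_dist is None:
        return 1.0
    return sum(P_U[k] * report_dist.get(k, 0.0) for k in P_U)

# Hypothetical two-label example (assumed values).
P_Y = {1: 0.7, 2: 0.3}
P_U = {1: 0.4, 2: 0.6}
P_I_given_Y = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.2, 2: 0.8}}

# Marginal distribution of the informative signal, P(I) = sum_y P(y) P(I|y).
P_I = {i: sum(P_Y[y] * P_I_given_Y[y][i] for y in P_Y) for i in (1, 2)}

match_prejudiced = agreement_with_prejudiced_crowd(P_U)
match_random = agreement_with_prejudiced_crowd(P_U, P_Y)
match_truthful = agreement_with_prejudiced_crowd(P_U, P_I)

print(match_prejudiced, match_random, match_truthful)
```

In this example the prejudiced report always matches the crowd, while the random and truthful reports match less than half of the time, so an agent rewarded for agreement has no reason to deviate.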

An interesting open question is how to select among
these equilibria. While the fragility of the equilibrium
where all agents randomize irrespective of type makes
it an unlikely candidate, it is unclear how to select
between the other two. We conjecture that, given
equal-sized populations of informed and uninformed agents,
the equilibrium selected will depend on the relative entropy
of the prejudice and informed distributions, with
the lower entropy distribution being more likely to be
selected.



Impossibility 

Any mechanism in the class defined above must fail in 
some equilibrium and for some distribution. Consider a 
situation where all agents are informed and play truthfully:
the mechanism succeeds if and only if it identifies
all agents as informed and truthful. Now consider a new
situation, one in which the equilibrium where all agents 
play the prejudice occurs, and the distribution of the 
prejudice in this situation is identical to the distribution 
of the truth in the previous situation. Since the play 
observed in both games is identical from the perspec- 
tive of the mechanism, it must designate all agents as 
informed and truthful in the second situation and fail, 
or it must have identified some agents as prejudiced in 
the first situation and fail. 
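
The argument can be illustrated with a toy mechanism. The sketch below is not any of the cited algorithms: `majority_agreement_mechanism` is a hypothetical gold-free rule that accepts agents who agree with the per-item majority. The example further assumes a perfectly informative signal in the first game and a prejudice shared by all agents in the second, with the realised prejudice sequence chosen to equal the truth sequence of the first game.

```python
import random

def majority_agreement_mechanism(reports):
    """Hypothetical gold-free mechanism: accept every agent whose reports
    agree with the per-item majority on more than half of the items.
    reports: dict agent -> list of labels (one per item)."""
    agents = list(reports)
    n_items = len(next(iter(reports.values())))
    majority = []
    for j in range(n_items):
        column = [reports[a][j] for a in agents]
        majority.append(max(set(column), key=column.count))
    return {a for a in agents
            if sum(reports[a][j] == majority[j] for j in range(n_items))
            > n_items / 2}

rng = random.Random(0)
labels, n_items, agents = [1, 2], 50, ["a", "b", "c"]

# Game 1: all agents are informed and truthful. For simplicity the
# informative signal is assumed to reveal y exactly.
truth_1 = rng.choices(labels, [0.7, 0.3], k=n_items)
reports_1 = {a: list(truth_1) for a in agents}

# Game 2: all agents report a shared prejudice whose distribution equals
# the distribution of the truth in game 1. The true labels of game 2 are
# drawn independently of the prejudice.
prejudice_2 = list(truth_1)
truth_2 = rng.choices(labels, [0.3, 0.7], k=n_items)
reports_2 = {a: list(prejudice_2) for a in agents}

out_1 = majority_agreement_mechanism(reports_1)
out_2 = majority_agreement_mechanism(reports_2)

# The mechanism sees identical input, so it must give identical output,
# although the reports track the truth only in game 1.
print(reports_1 == reports_2, out_1 == out_2)
```

Any mechanism that is a function of the reports alone behaves the same way here: accepting everyone is correct in the first game and wrong in the second.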

Using a gold standard 

Access to sufficient gold standard data (a sample from
the informed distribution) is enough
for the impossibility result to no longer apply. In the
situation where agents play the prejudice with high
probability, the labels reported by those who are playing
prejudice will contradict the labels the mechanism has access
to, and thus those agents can be identified. The mechanism needs
access to enough labels from the informative distribution
to identify either an agent that is playing prejudice or
one that is playing truthfully. It can then extrapolate to
agents who have labeled points for which it did not originally
have labels, based on their agreement or disagreement
with those agents it previously identified. This new set
of identified players can then be used to
extrapolate to the players who labelled points in common
with them, and this procedure can be repeated until
all players have been identified (this requires that there
is sufficient overlap between the data points labeled by
agents). The same logic can be adapted to algorithms
that do not have players label points in common, such as
(Dekel & Shamir 2009b), but the overlap then applies to
how features map to labels instead of how labels map to
points.
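
The extrapolation procedure described above can be sketched as an iteration over a growing set of trusted labels. This is only one possible reading: the function name `propagate_from_gold`, the parameters `min_overlap` and `threshold`, and the rule of adopting the labels of agents judged truthful are assumptions made for illustration.

```python
def propagate_from_gold(reports, gold, min_overlap=3, threshold=0.5):
    """Classify agents starting from a small gold set.

    reports: dict agent -> {item: label}; gold: dict item -> label.
    Returns dict agent -> bool (True = judged truthful). Agents that
    never share at least min_overlap items with the trusted labels are
    left unclassified and omitted from the result.
    """
    known = dict(gold)            # labels currently trusted
    verdict = {}
    changed = True
    while changed:
        changed = False
        for agent, labels in reports.items():
            if agent in verdict:
                continue
            shared = [item for item in labels if item in known]
            if len(shared) < min_overlap:
                continue          # not enough overlap yet, retry later
            rate = sum(labels[i] == known[i] for i in shared) / len(shared)
            verdict[agent] = rate > threshold
            if verdict[agent]:
                # Adopt this agent's labels on items not yet trusted,
                # which lets the next agents be compared against them.
                for item, label in labels.items():
                    known.setdefault(item, label)
            changed = True
    return verdict

# Hypothetical data: gold covers items 0-2; t1 overlaps the gold set,
# t2 and p1 only overlap t1, and z overlaps nobody.
truth = {0: 1, 1: 2, 2: 1, 3: 2, 4: 1, 5: 2, 6: 1, 7: 2, 8: 1}
gold = {k: truth[k] for k in (0, 1, 2)}
reports = {
    "t2": {k: truth[k] for k in range(3, 9)},        # truthful
    "p1": {k: 3 - truth[k] for k in range(3, 9)},    # always disagrees
    "t1": {k: truth[k] for k in range(0, 6)},        # truthful
    "z": {20: 1, 21: 2, 22: 1},                      # no overlap
}
verdict = propagate_from_gold(reports, gold)
print(verdict)
```

Agent `z` stays unclassified, which reflects the requirement that there be sufficient overlap between the data points labeled by agents.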

Interestingly, access to the prejudice is also sufficient
for the mechanism to succeed in that situation, as this
can also be used to identify, with high probability, those who
are playing prejudice when their labels overlap with the
points for which the mechanism has access to the prejudice.
The same overlapping procedure can then be used
to reveal the strategies of the other players.

Since randomising gives the sequence of play the highest
entropy, it is the strategy available to an uninformed player
that is hardest to distinguish from the true labels. The
mechanism can therefore identify players who play the prejudice
in this situation faster than those that are randomizing.
Thus, when the mechanism uses the gold data there is no
equilibrium in which agents play prejudice. The
gold data also guarantees that informed agents have a
dominant strategy to be truthful. This implies that the only
equilibrium has informed agents playing truthfully and
uninformed agents randomizing.
CONCLUSION

We consider algorithms that attempt to distinguish be- 
tween informed and uninformed workers without using 
gold standard data, and show that when these algo- 
rithms are analyzed as mechanisms they can lead to 
equilibria where no agents truthfully reveal their private 
information about the label if they have access to it but 
rather report labels that are uninformative of the true la- 
bel, but on which they can coordinate with other agents. 
In future research, experimental work to identify whether
the equilibria identified in the theoretical model occur,
and to test theories of equilibrium selection if they do,
would be extremely interesting.